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Abstract. The independent component model is a latent variable model 
where the components of the observed random vector are linear combinations 
of latent independent variables. The aim is to find an estimate for a transfor¬ 
mation matrix back to independent components. In moment-based approaches 
third cumulants are often neglected in favour of fourth cumulants, even though 
both approaches have similar appealing properties. This paper considers the 
joint use of third and fourth cumulants in finding independent components. 

First, univariate cumulants are used as projection indices in search for inde¬ 
pendent components (projection pursuit). Second, multivariate cumulant ma¬ 
trices are jointly used to solve the problem. The properties of the estimates are 
considered in detail through corresponding optimization problems, estimating 
equations, algorithms and asymptotic statistical properties. Comparisons of 
the asymptotic variances of different estimates in wide independent component 
models show that in most cases symmetric projection pursuit approach using 
both third and fourth squared cumulants is a safe choice. 

1. Introduction 

In this paper we consider the use of third and fourth cumulants in independent 
component analysis (ICA). The basic blind source separation model assumes that 
the observed vectors are linear combinations of some latent unobservable 

variables e i = I,n, the recovering of which is the objective of the analysis. 
If we write 

X = (xi,...,x„)'^ G and Z = (zi,z„)^ G 

we have a semiparametric model 

X = l„/x^ + 

with a shift vector fi and a non-singular transformation matrix or mixing matrix 
n G In the independent component model the columns of Z are assumed to 

be independent, and in the most classical model it is further assumed that the rows 
of Z, that is, Zi,...,z„ are independent and identically distributed, each having p 
independent components. The model has the intuitive interpretation of p hidden 
independent signals, the properties of which we observe only through an unknown 
linear mixing process. 

2000 Mathematics Subject Classification. Primary 62H10; Secondary 62H12. 

Key words and phrases. Skewness, Kurtosis, FastICA, FOBI, JADE. 


1 



2 


THIRD AND FOURTH CUMULANTS IN ICA 


Projection pursuit (PP) is a popular method to reveal hidden structures in the 
data by searching for low-dimensional orthogonal projections of interest. This is 
done by finding one or several linear combinations of the original variables that 
maximize the value of an objective function, the so-called projection index. The 
classical measures of skewness and kurtosis, the third and fourth moments of a ran¬ 
dom variable after standardization, have been widely used for this purpose. Huber 
(1985) considered projection indices with heuristic arguments that non-gaussian 
linear combinations are most interesting. His indices were ratios of two disper¬ 
sion functionals thus measuring kurtosis, with the classical kurtosis measure as a 
special case. Pena and Prieto (2001) used projection pursuit for hidden cluster 
identification, again with the classical kurtosis measure. For early contributions 
on projection pursuit, see also Friedman and Tukey (1974) and Jones and Sibson 
(1987). In the engineering literature, Hyvarinen and Oja (1997) were the first to 
propose a projection pursuit approach for independent component analysis with the 
absolute value of the excess kurtosis, the fourth cumulant of a standardized vari¬ 
able, as a projection index and considered later an extension with a choice among 
several alternative measures of non-gaussianity including the absolute value of the 
classical skewness, namely, the third cumulant of a standardized random variable. 
The approach is called deflation-based FastICA or symmetric FastICA depending 
on whether the independent components are found one-by-one or simultaneously. 
FastICA is perhaps the most popular approach for the ICA problem in engineering 
applications. Recently, Miettinen et al. (2015) surveyed and discussed in detail the 
statistical properties of unmixing matrix estimates based on the use of the absolute 
value of the excess kurtosis as a projection index. 

The concept and measures of kurtosis have been extended to the multivariate 
case as well. The classical skewness and kurtosis measures by Mardia (1970), for 
example, combine in a natural way the third and fourth moments of a standardized 
multivariate variable. Mardia’s measures are invariant under affine transforma¬ 
tions. For other combinations of standardized third and fourth moments, see also 
Mori et al. (1994); Kollo (2008). In the invariant coordinate selection (ICS) (Tyler 
et al., 2009) one finds, using two scatter matrices, an unmixing matrix such that 
the back-transformed variables are presented in an invariant coordinate system, 
standardized and ordered according their (generalized) kurtosis. In independent 
component analysis, certain scatter matrices based on fourth moments and the co- 
variance matrix are used together in a similar way to find the transformations to 
independent components; e.g. fourth order blind identification (FOBI) by Cardoso 
(1989) and joint approximate diagonalization of eigen-matrices (JADE) by Cardoso 
and Souloumiac (1993) are regularly used in independent component analysis. Mi¬ 
ettinen et al. (2015) give a detailed survey of FOBI and JADE estimates with a 
comparison to deflation-based and symmetric projection pursuit estimates that use 
the absolute value of the excess kurtosis as a projection index. Pena et al. (2010) use 
a fourth moment kurtosis matrix to reveal cluster structures in the data. Similarly 
Loperfido (2013, 2015) apply multivariate skewness measures for this purpose. 

Independent component analysis has so far been mainly developed in the engi¬ 
neering literature and seen as a computational tool to decompose a multivariate 
signal into independent non-gaussian signals. The procedures are then considered 
as numerical algorithms rather than estimates of certain population quantities and 
considering their statistical properties has been neglected. Recently, statisticians 
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have become interested in the problem. Chen and Bickel (2006) and Samworth and 
Yuan (2012) for example developed estimates that need only the existence of first 
moments and rely on efficient nonparametric estimates of the marginal densities. 
Efficient estimation methods based on residual signed ranks and residual ranks have 
been developed recently by Ilmonen and Paindaveine (2011) and Hallin and Mehta 
(2015). 

As far as the authors know, this paper introduces for the first time several ICA 
procedures that jointly use third and fourth cumulants. Only in the case of the 
JADE-type approach of Section 5.2 has this been done before, see Moreau (2001). 
First, weighted sums of squared third and fourth cumulants are used as projection 
indices in search for independent components (deflation-based and symmetric PP). 
In most cases, our estimates then outperform the classical FastICA estimates that 
use either absolute values of the third cumulants or absolute values of the fourth 
cumulants. Second, multivariate third and fourth cumulant matrices are jointly 
used to find an unmixing matrix estimate. Our approach is again novel in the sense 
that it uses also the multivariate third cumulant matrices. The classical FOBI 
and JADE estimates are found as special cases. The properties of the estimates 
are considered in detail through corresponding optimization problems, estimating 
equations, algorithms and asymptotic statistical properties. Comparisons of the 
asymptotic variances of different estimates in wide independent component models 
with skew, heavy- and light-tailed marginal distributions show that in most cases 
symmetric projection pursuit approach using third cumulants only outperforms its 
competitors. 

The paper is structured as follows. We first introduce some helpful notation in 
Section 2. After introducing the independent component (IC) model with relevant 
assumptions in Section 3, the unmixing matrix estimates based on the projection 
pursuit approach and those based on the multivariate cumulant matrices are dis¬ 
cussed in detail in Sections 4 and 5, respectively. In Section 6 the procedures are 
first compared in the case of cluster identification (using only one independent com¬ 
ponent) and then in the general case of p independent components. We end with 
some discussion on the results and their importance in Section 7. The proofs are 
reserved for the Appendix. 


2. Notation 

For a univariate random variable x, we write Xst = {x — E{x))/ar{x) for its 
standardized version. The classical skewness, kurtosis and excess kurtosis of x are 
then 

7 (x) = E {xlt) , /3(x) = E {x%) and k{x) = /3(x) - 3. 

Note that the measures "/{x) and k{x) are the third and fourth cumulants of the 
standardized variable Xst- For symmetrical random variables 7 ( 2 ;) = 0 and for the 
normal distribution k{x) = 0 . 

Throughout the paper we assume that Zi, ...,z„ is a random sample from a p- 
variate distribution of z with E{z,) = 0 and Cou(z) = Ip and that the p components 
of z are mutually independent. As different moment-based quantities play a crucial 
role in our derivations, we have the shorthands 

=: 7 fe, E{zff^) =: /3k and E{zf^) - 3 =: k*,, k= I, ...,p. 
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For all k = 1, the moment-based expressions 

Eizfk) - =: uJk, E{ztk) - I =■. vu and Fl(zffe) - E{zlk) =■ Vk 

are encountered numerous times in the expressions for the asymptotic variances of 
our estimates and thus deserve symbols of their own. The limiting distributions of 
our unmixing matrix estimates depend on the joint limiting distributions of 
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Central limit theorem can be used to prove the joint limiting multinormality of 
these statistics with the variances and covariances as listed in Table 1. 
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For a p-variate random vector x with mean vector /l and covariance matrix 
S, the standardized vector is — fi), where is chosen as the 

symmetric matrix G satisfying GSG = Ip. A useful result (see for example Ihnonen 
et al. (2012)) regarding the standardized observations is that if x* = Ax-|-b, then 
x*( = Uxs 4 , for some orthogonal matrix U € This fact is used in proving the 

affine equivariances of the different functionals later on. Additionally, the centered 
observations are in the proofs denoted with := — z for clarity. 

The standard basis vectors of are denoted by G Rp. That is, the jth 
element of e.i is equal to Kronecker’s delta 6ij = I{i = j). Using the standard basis 
vectors we further define the following matrices 

F,^^=e,eJ, i,j = l,...,p, 

the only non-zero element of E*-* being the element (i,j). Finally, some often 
encountered sets of matrices are denoted with symbols of their own: 

• U = {U G RP^P : U is an orthogonal matrix.} 

• J = {J G : J = diag{ji, ..., Jp), ji, = ±1} 

• V = {D G RP^P : D = diag{di ,..., dp), di ,..., dp > 0} 

• V = {P G RP^P : P is a permutation matrix.} 
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3. Independent component model 


The model used throughout the paper is the independent component model (IC 
model), in which the p-variate observations Xi, are thought to originate as 

(1) Xi=/L + rizi, 

where the unobserved, independent and identically distributed vectors = {zn,Zip 
satisfy the following three assumptions. 


Assumption 1. Zii,...,Zip are standardized and mutually independent. 
Assumption 2. At most one of Zn,Zip is normally distributed. 

The conditions E(zik) = 0 and £’( 2 ^^,) = 1, k = in Assumption 1 just 

serve as identification constraints for the location p and the lengths of the rows of 
n. Then 

£(xi) = p and Cov(xi) = Ti ~ 

To see why Assumption 2 has to hold, consider the case z^ ~ A/ 2 ( 0 ,Ip). Then any 
orthogonal transformation preserves the distribution of z^, that is, z^ ~ Uz^ for all 
\J G U, and we can recover the original z^ only up to some orthogonal matrix U. 
Regarding the uniqueness of the independent components after our assumptions, it 
is easy to see that the signs and the order of the independent components are not 
fixed in the model. This, however, is satisfactory in most applications. 

Additionally, we introduce the following six assumptions, each of which is a 
stricter version of Assumption 2 and implicitly assumes that the third and fourth 
moments exist. This hence rules out heavy-tailed distributions. The relevance of 
these assumptions will become apparent in later discussions on the existence and 
properties of different unmixing matrix functionals. Recall that 

-ik = l{zrk) = E{z%) and Kk = K{zik) = E{zf^.) - 3, k = l,...,p. 


Assumption 3. 
Assumption 4. 
Assumption 5. 
Assumption 6. 
Assumption 7. 
Assumption 8. 


At most one of ji, ...,^p is zero. 

At most one of ki, ..., Kp is zero. 

71 ,..., 7 p are distinct. 

Ki,...,Kp are distinct. 

For at most one k, jk = ^k = 0. 

There is no k ^ I such that jk = 7i cind Kk = ki. 


Assumption 3 is often considered to be much more restrictive than Assumption 
4 as it limits the number of symmetric sources to one. The assumption of sym¬ 
metric sources is made in Ilmonen and Paindaveine (2011). Their approach allows 
however heavy-tailed distributions as the existence of moments is not assumed. 
Note also that Assumptions 5, 6 and 8 rule out components with identical marginal 
distributions. 

The structure of the assumptions is depicted in Figure 1. From the graph we 
again see that all the “moment-based assumptions” are stronger than Assumption 
2 and the most stringent amongst them are Assumptions 5 and 6 . 

Next we state one of the key results of independent component analysis, the 
proof of which can be found, e.g., in Miettinen et al. (2015). 
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Figure 1. The relationships and implications between the differ¬ 
ent assumptions on the independent components. 

Theorem 3.0.1. Let x € follow the independent component model in (1). 
Then the standardized observations Xgt = /x) satisfy z= Uxgt for some 

orthogonal matrix U. 

Theorem 3.0.1 essentially states that the estimation of the unmixing matrix 
can in fact be reduced to a simpler task, namely to the estimation of an orthogonal 
matrix U. This result is used repeatedly in the following sections. 

Finally, we define the independent component functional W(F) as follows. 

Definition 3.0.1. The functional 1F(F) € is said to be an independent com¬ 

ponent functional if (i) W{Fg.)x has independent components under the independent 
component model (1) and (ii) W{Fg.) is affine equivariant in the sense that for all 
X, all full-rank A S and b G there exist P G V and J G ff sueh that 

W{FA^+b)Ax= PJW{F^)x. 

Note that the functional W(F) is defined at any F and is required to be Fisher 
consistent to up to permutation and heterogeneous sign-changes of the rows. 
The functional W(F) is affine equivariant and therefore provides a transformation 
to an invariant coordinate system (ICS), that is, it is also an ICS funetional; see 
Tyler et al. (2009) and Ilmonen et al. (2012). Let next be the empirical cumu¬ 
lative distribution function from a random sample Xi,..., x„ from F. Then W(F„) 
provides a natural affine equivariant estimate of W(F). The affine equivariance 
property simplifies the derivation of the asymptotic behavior of W(F'„) consider¬ 
ably as we may restrict our attention to the case D = Ip only. 

Finally, note that Assumption 2 guarantees that the estimated vector of indepen¬ 
dent components is indeed equal to z up to sign and order, that is, all independent 
component functionals W(Fx) lead to the same independent components up to sign 
change and permutation; see the Ghurye-Olkin-Zinger characterization theorem in 
Ibragimov (2014). 

4. Univariate third and fourth cumulants 

We first consider the use of univariate third and fourth cumulants in estimating 
the unmixing matrix, leading in old and new variants of the so-called deflation-based 
FastICA and symmetric FastICA. 

4.1. Estimating the components separately. First, to actually guarantee the 
validity of our approach, we prove the following inequality, an extension of Theorem 
2 in Miettinen et al. (2015). 
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Theorem 4.1.1. Let z G have independent components with E{z) = 0 and 
Cov{z) = Ip. Then 

+ a 2 K^{u^z) < max ( 017 ^ + 02 ^^) > 

l<fe<p ' ' 

for all ai, 0.2 & K’*' U {0} and for all vectors it S satisfying vFu = 1. 

The inequality in Theorem 4.1.1 implies that the independent components can 
be recovered by repeatedly searching for mutually orthogonal vectors u maximizing 
the projection index 

ai7^(u'^Xs4) + a2'«^(u^Xst) 

and we give the following. 

Definition 4.1.1. The deflation-based projection pursuit functional based on squared 
third and fourth cumulants is a functional W{Fx) = where S = Cov(x) 

and the rows of the orthogonal matrix U = (tii,..., Up)"^ are found one-by-one, such 
that 

Uk= argmax {a^'^{ulXst)(1 - a) k'^{ ulx^t)) , 

ufuj=Skj,l<j<k 

where a S [0,1] is the proportion of weight given to third cumulants. 

Note that weights ai and 02 and weights ai/(ai + 02 ) and 02/(01 + 02 ) in 
Theorem 4.1.1 lead to the same optimization problem, and we may without loss 
of generality use just a single weight parameter a = ai/(ai + 02 )- An interesting 
choice is a = 0.8 corresponding to 

7 ^(u'^Xst) ^^(u^Xst) 

6 24 

as we are then maximizing the value of a functional that is often used to test 
for univariate normality (see Jarque and Bera (1987)). Note also, that choosing 
either a = 0 or a = 1 makes the proposed method equivalent to the so-called 
deflation-based FastICA (Hyvarinen, 1999) with the projection indices | 7 (u^Xsj)| 
and |/c(u^Xst)|, respectively. For general results concerning deflation-based Fas¬ 
tICA using absolute values see also Ollila (2010); Nordhausen et al. (2011); Miet- 
tinen et al. (2014a). 

The affine equivariance of the procedure given in Definition 4.1.1 follows simply 
from the fact that the optimization problem along with the constraints is invariant 
under mappings x^j i-A- Vxgt, where 'V € U. (Recall that the transformation 
X —>■ Ax-fb induces the transformation Xgt — t Vxg^ for some orthogonal V.) This 
together with Theorem 4.1.1 implies the following. 

Lemma 4.1.1. The deflation-based projection pursuit functional W{Ff) in Defi¬ 
nition j.1.1 is an independent component functional for every a € [0,1]. 

The Lagrangian of the maximization problem involving uj, has the form 

L(ufe, Afc) =a(F [(u^x,t)3] -b (1 - a) (F [(u^Xst)'‘] - 3)^ 

k -1 

- X! - Afcfc(ujufc - 1). 

i=i 
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First differentiating w.r.t. u/j and the Lagrangian multipliers and then solving 
for the Lagrangian multipliers and substituting them back in yields the following 
estimating equation for the fcth row u^. 

Tfc = 0 

where 

Tfc = 3aE [(UfeX^t)^] E [{ulxstfxst] 

+ 4(1 - a) {E [(ul’xst)^] -3)E [(u^Xst)3x^4] . 

After finding we then obtain a fixed-point solution for Uj. by succes¬ 

sively iterating over the the following steps. 

(1) Ufc ^ (^Ip - UjuJ^ Tfc. 

(2) Ufc ^ ||ufc||-iufc. 

A Newton-Raphson type algorithm for this problem might be more efficient and 
will be considered in a separate paper. 

Additionally, the estimating equations provide us with the following results re¬ 
garding the asymptotic behavior of the unmixing matrix estimates W in the case 
O = Ip. Note that, this is sufficient as all estimates are affine equivariant. The 
general case easily follows. 

Theorem 4.1.2. (i) Let Zi,...,Zn be a random sample from a distribution with 
finite sixth moments and satisfying assumptions 1 and 3. Then there exists a se¬ 
quence of solutions based on skewness (that is, a = such that W^p Ip and 

Vnwki = -y/nwik - Vnski op(l), I < k, 

Vn{wkk - 1) = -^Vn{skk - 1) + op(l), 

Vnwki = + op(l), I > k, 

'^k 

where i>iki = ikfki - llski- 


(ii) Let Zi,..., be a random sample from a distribution with finite eighth mo¬ 
ments and satisfying assumptions 1 and 4- Then there exists a sequence of solutions 
based on kurtosis (that is, a = 0) such that W Ip and 

Vnwki = -y/nwik - Vnski + op(l), I < k, 

'Jn{wkk - 1) = V^(sfcfc - 1) + op(l), 

Vnwki = l>k, 

i^k 

where ^ 2 ki = Hkqki - KkPkSki- 

(Hi) Let Zi,..., z„ be a random sample from a distribution with finite eighth mo¬ 
ments and satisfying assumptions 1 and 1. Then there exists a sequence of solutions 
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based on both skewness and kurtosis such that W Ip and 


Vnwki 

Vn{wkk - 1) 


Vnwki 


-^/nwik - Vnski + op(l), I < k, 


- 1) + Op(l), 

Za^Jri'ipiki + 4(1 - a)^ii2ki 

3a7|+4(l-a)«;2 


op(l)) 


I > k, 


where a G [0,1] is the proportion of weight given to skewness, and ifiki is as in (i) 
and ip 2 ki as in (ii). 

Corollary 4.1.1. (i) Under the assumptions of Theorem 4-1. 2(i) the limiting dis¬ 
tribution of ^/nvecifW — Ip) is multivariate normal with mean vector 0 and ele¬ 
mentwise variances 


ASV{wki) = ^ + l, l<k, 

ASV{wkk) = 

ASV[wki) = ^, l>k, 

Ik 

where Cii* = lliyk - iD- 


(ii) Under the assumptions of Theorem 4A.2(ii) the limiting distribution of 
\/nvec{ W-Ip) is multivariate normal with mean vector 0 and elementwise vari¬ 
ances 


ASV {wki) 

ASV{wkk) 

[wki) 


A “ -L? 


Pfe + 2 

4 ’ 

(,22k 


A ’ 


I < k, 


I > k, 


where ( 22 k = sii{uJk - Pi). 


(Hi) Under the assumptions of Theorem 4.1^(i'l'l) the limiting distribution of 
\/nvec{ W-Ip) is multivariate normal with mean vector 0 and elementwise vari¬ 
ances 


ASV {wki) 
ASV(wkk) 
ASV {wki) 


Qo^Cih + 16(1 - a)^C22i + 24a(l - a)Ci2Z 

(3a7f+ 4(l-a)A2)2 

Kfc + 2 

4 ’ 

9a^Ciifc + 16(1 — cTf(22k + 24q;( 1 — a)Ci2fe 
(8072+4(1 - a)K^)2 


where a G [0,1] is the proportion of weight given to skewness, and Ciifc is as in (i), 
( 22 k as in (ii) and ( 12 k = IkHkiVk - IkPk)- 
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4.2. Estimating the components simultaneously. As in Section 4.1, we first 
provide the justification for the validity of our approach in the form of the following 
inequality. 

Theorem 4.2.1. Let z G have independent components with E{z) = 0 and 
Cov{z) = Ip. Then 

+ (1 - a)^K^{ulz) < + (1 - a)'^Kl, 

k=l k=l k=l k=l 

for all orthogonal matrices U = (tti,..., Up)^ G R^^^ and for all a G [0,1]. 

The inequality in Theorem 4.2.1 suggests the following strategy for searching for 
the independent components. 

Definition 4.2.1. The symmetric projection pursuit functional based on squared 
third and fourth cumulants is a functional W{Fgf) = , where S = Cov{x) 

and the rows of the orthogonal matrix U = (ui ,..., itp)^ are found simultaneously, 
such that 

U = argmax a ^"^(v^Xst) + (1 — a) 

V 

where a G [0,1] is the proportion of weight given to third cumulants. 

Recall that in the classical symmetric fastICA approach utilizing third or fourth 
cumulants one finds U that maximizes either 'Sk=i l^(UfeXst)|. 

We thus use squares instead of absolute values and both cumulants simultaneously. 
See also Wei (2014); Miettinen et al. (2015) for more details on the approach using 
absolute values. 

In Comon (1994) the projection indices that satisfy inequalities such as in The¬ 
orem 4.2.1 are called contrasts, see also Moreau (2001). Both papers also show 
that in general any cumulants of order 3 or higher can be used in independent 
component analysis as contrasts. 

It is easy to see that the functional in Definition 4.2.1 is affine equivariant and 
Theorem 4.2.1 implies the following. 

Lemma 4.2.1. The deflation-based projection pursuit functional W{Fg;) in Defi¬ 
nition 4-2.1 is an independent component functional for every a G [0,1]. 

The Lagrangian of the maximization problem in Definition 4.2.1 has the form 
L(U, A) = a^(E [(u^x,,)'])' + (1 - a) (A [(u^x,*)"] - 3)" 

k^l k^l 

p— Ip p 

- ^Afcfe(u|’ufc - 1). 
k—1 l—k-\-l k—1 

First differentiating w.r.t. U and the Lagrangian multipliers in A and then noticing 
that the multipliers have two solutions that must be equal, we get equations 
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where again 

Tfe = 2>aE [(UfcXst)^] E [(u^Xst)^X5t] 

+ A{1 - a) {E - 3) E [{ulyistfy^st] , k = l,...,p. 

If we then write 

T = (Ti,...,Tp)^ 
we get, as in Miettinen et al. (2015), the following. 

Lemma 4.2.2. The estimating equations for U in Definition 4-2.1 are 

UT^ = TU^ and UU^ = Ip 
or, equivalently, U= r(T^T)“^/^. 

The estimating equations then suggest a fixed-point algorithm with a step 

U ^ T(T^T)-i/2. 

and further provide the following results regarding the asymptotic behavior of the 
estimate W in the case = Ip. . 

Theorem 4.2.2. (i) Let Zi,...,Zn be a random sample from a distribution with 
finite sixth moments and satisfying assumptions 1 and 3. Then there exists a se¬ 
quence of solutions based on skewness (that is, a = such that W^p Ip and 

Vn{wkk - 1) = -^Vn{skk - 1) + op(l), 

Vnwki = +op(l), If^k, 

% + if 

where -ipiki = Jkfki - link - ifski- 


(ii) Let Zi,Zn be a random sample from a distribution with finite eighth mo¬ 
ments and satisfying assumptions 1 and 4- Then there exists a sequence of solutions 
based on kurtosis (that is, a = 0) such that W Ip and 

Vn{wkk - 1) = -^V^(sfefc - 1) + op(l), 

Vnwki = +op(l), If^k, 

+ «r 

where '^ 2 ki = Hkqki - Kiqik - (nkPk - 5Ki)ski- 

(Hi) Let Zi,Zn be a random sample from a distribution with finite eighth mo¬ 
ments and satisfying assumptions 1 and 7. Then there exists a sequence of solutions 
based on both skewness and kurtosis such that W ^p Ip and 

y/n{wkk - 1) = -^Vnihk - 1) + op(l), 

_ Say/nipiki -I- 4(1 - a)y/ri'ip 2 ki ...^ . 

3a(7fe+7f)+4(l-a)(4 + '«?) 

where a G [0,1] is the weight given to skewness, and fjiki is as in (i) and 'ijj 2 ki as 
in (ii). 
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Corollary 4.2.1. (i) Under the assumptions of Theorem f.2.2(i) the limiting dis¬ 
tribution of ^Jnvec(W — Ip) is multivariate normal with mean vector 0 and the 
following asymptotic variances. 

ASV{wkk) = 

ASV{WUI)= . 2 !“ 2^2 ' 

{Tk+lt) 

where Cii = ll{i^k - ll) + - ^f) + if- 

(ii) Under the assumptions of Theorem the limiting distribution of 

^/nvec{ W-Ip) is multivariate normal with mean vector 0 and the following as¬ 
ymptotic variances. 

ASV{wkk) = 

^SV{wki) = ^^^^^, k^l, 
where C 22 = ^li^k - Pi) + - /3f) + ref. 


(Hi) Under the assumptions of Theorem 4.2.2(iii) the limiting distribution of 
y/nvec{ W-Ip) is multivariate normal with mean vector 0 and the following as¬ 
ymptotic variances. 


ASV{wkk) 
ASV (wki) 


Afc + 2 

4 ’ 

9q;^Cii + 16(1 - a)^C22 + 24 q;(1 - q;)Ci2 . ^ . 

{Mll+lf)+^i^-a){^l+^fW ’ ^ ’ 


where a G [0,1] is the weight given to skewness, and Cn is as in (i), C 22 as in (ii) 
and C 12 = IkKkiVk - IkPk) + liKiim - liPi) + ifi^i- 


5. Multivariate third and fourth cumulants 

As the previous methods of estimating the unmixing matrix utilized only the 
marginal third and fourth cumulants of the components, a natural question is 
whether the use of multivariate moments has any benefits. We therefore consider 
the following sets of matrices, capturing all joint third and fourth cumulants of the 
random p-vector x with E(x) = 0. 

C^*(x) = E[xi-xx^], i=l,...,p, and 

C^*l(x) = E [xiXj ■ xx^] — E{xiXj)E(x.x^) 

— E{xi ■ x)E{xj ■ x^) — E{xj ■ x)E{xi ■ x^), i,j = l,...,p. 

Evaluating matrices C^*(z) and gives 

C^\z) = 7 ,E" and C‘‘*l(z) = 

showing that both C^*(z) and C^*l(z) are diagonal for all i,j = 1, ...,p. Based upon 
them we can construct two matrices combining specific subsets of third and fourth 
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joint cumulants, which we will call compound cumulant matrices: 

P P p p 

C3(z) = ^C3*(z) = and C^(z) = ^ C^"(z) = ^ 

i=l i-1 i=l i=l 

The next theorem then gives us two viable ways of recovering the independent 
components using the previously defined cumulant matrices. 

Theorem 5.0.3. Let z G have independent components with E{z) = 0 and 
Cov{z) = Ip. Then for all orthogonal U, the eigenvectors of the symmetric matrices 
C'^\Uz), C*^^{Uz), C^{Uz) and C^{Uz), i,j = l,...,p are the columns of U. 

Theorem 5.0.3 says that the rotation giving the independent components from 
the standardized observations is such that it diagonalizes all the matrices C^(xs(), C^(xst), C^*(xst) 
and C'‘*'^(xst), i,j = To recover the independent components in practice 

we thus want to find a rotation IJ G U that simultaneously makes all the matri¬ 
ces UCgU^, where Cg, s = is some subset of the previous matrices, as 

diagonal as possible. 

One way of accomplishing this is based on the observation that for any family 
of matrices Cg, s = 1,..., S, and any XJ Gld we have 

s s s 

^ ||diag(UCgU^)f + ^ ||off(UCgU^)f = ^ IlCgf, 

S=1 S=1 

implying that the joint approximate diagonalization can be preformed by finding 
\J Ghl that maximizes the sum X]f=i l|diag(UCgU^)p. The process can then be 
thought as a sort of “joint eigendecomposition”. The concept is not new and has 
been used before e.g. in Cardoso and Souloumiac (1993) and in Moreau (2001). 

5.1. Using compound cumulant matrices. Having already justified the work¬ 
ing of the following methods, we first present the use of compound cumulant ma¬ 
trices and in recovering the independent components. 

In order to obtain an affine equivariant procedure this time, we must use a 
somewhat unorthodox standardization. Namely, we pretransform the data by an 
arbitrary independent component functional. This is necessitated by the “bad 
behavior” of the compound matrix of third cumulants C^. Writing the IC functional 
in the form for some U* G U, makes the standardization 

then correspond to the transformation x i—> = (S*)- ■^/^(x — p). Note that 

= Ip so that is just a certain asymmetric version of 

S~ ' . Surprisingly, the limiting behavior of the estimates then does not depend 
on the root-n consistent choice of Note that this idea to achieve affine 

equivariance for another IC method was also used in Miettinen et al. (2013). 

Defiuitiou 5.1.1. The compound cumulant functional based on both third and 
fourth cumulants is a functional W{Fj:) = i7(S*)“^/^, where (S*)“^/^ is the stan¬ 
dardizing IC functional and the orthogonal matrix U is found as 

U= argmax (a\\diag{UC^{x*t) C/'^)f + {1 - a)\\diag{UC‘^{x*t)U^)\\^) , 
ueu ^ ' 

where a G [0,1] is the proportion of weight given to skewness. 

Letting then a = 1 or a = 0 and using the properties of the matrices and 
yields the following corollary. 
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Corollary 5.1.1. (i) The compound cumulant functional based on third cumulants 
(that is, a = \) is a functional W{F„c) = where U has the eigenvectors 

of as its rows. 

(ii) The compound cumulant functional based on fourth cumulants (that is, a = 0) 
is a functional W{Fx) = where U has the eigenvectors of C‘^{x*f) as 

its rows. 

As already stated, the different standardization mechanism guarantees the affine 
equivariance of the procedure. 

Lemma 5.1.1. The compound cumulant functional W{Fx) in Definition 5.1.1 is 
an independent component functional for every a S [0,1]. 

Remark 5.1.1. We implicitly assume here that the IC functional used in the stan¬ 
dardization exists and is well-defined. The extra assumptions needed for its exis¬ 
tence and root-n consistency can be seen as the price we have to pay for making the 
compound cumulant method affine equivariant. 

Remark 5.1.2. If a = 0 and only Cf" is used, the prestandardization is not needed 
and the classical FOBI estimate is obtained as the solution. The most natural choice 
for is then the FOBI functional. 

The Lagrangian of the objective function in the maximization problem of Defi¬ 
nition 5.1.1 then has the form 

L(U, A) = a^(u^C^Ufc)2 -h (1 - 

k^l k^l 

p—1 p p 

^ Afcfc(u^Ufe - 1). 

k—1 l—k-\-l k—1 

(The matrices and C'^ are evaluated at x*^.) The optimization can be done as 
in Section 4.2, and the estimation equations are 

UT'^ = TU^ and = Ip 

where T = (Ti,..., Tp)^ with 

Tfe = a{ulC^Uk)C^Uk + (1 - a){ulC^Uk)C^Uk, k = 1, ...,p. 

The estimating equations again suggest a fixed-point algorithm and can be used to 
find the asymptotic behaviors of the estimates. We then have the following. 

Theorem 5.1.1. (i) Let Zi,...,z„ be a random sample from a distribution with 
finite sixth moments and satisfying assumptions 1 and 5. Assume further that 
— Jp) = Op(l). Then there exists a sequence of solutions based on 
skewness (that is, a = 1) such that W Ip and 

y/n{wkk - 1) = -^Vn{skk - 1) + op(l), 

Vnwki = -i-op(l), kf^l, 

Ik - 7/ 

where 'tfiu = fu + Hfe + J2m^k,i ^rnki - Ikhi- 
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(ii) Let Zi ,be a random sample from a distribution with finite eighth mo¬ 
ments and satisfying assumptions 1 and 6. Assume further that y/n — Jp) = 

Op(l). Then there exists a sequence of solutions based on kurtosis (that is, a = 0) 
such that WIp and 

Vn{wkk - 1) = -^Vn{skk - 1) + op(l), 

Vnwki = kf^l, 

Kk - Kl 

where ii2ki = Qki + qik + J2m^k,i ^rnki - {Kk + P + 4)sfc/. 


(Hi) Let Zi,Zn be a random sample from a distribution with finite eighth mo¬ 
ments and satisfying assumptions 1 and 8. Assume further that yfii, — Jp) = 

Op(l). Then there exists a sequence of solutions based on both skewness and kurtosis 
such that WIp and 


Vniuikk - 1) 


s/nwki 


-]^Vn{skk - 1) + op(l), 

a{lk - li)y/nipiki + (1 - 

a(7fc - 7/)^ + (1 - a)iKk - kiY 


op(l), kf^l, 


where a S [0,1] is the proportion of weight given to skewness, and ifiki is as in (i) 
and ip 2 ki as in (ii). 


Corollary 5.1.2. (i) Under the assumptions of Theorem 5.1.1(i) the limiting dis¬ 
tribution of ytnvec{W — Ip) is multivariate normal with mean vector 0 and the 
following asymptotic variances. 

ASVfwkk) = 

ASV{wki)= , ,, , k^l, 

where Cii = {vk - ll) + {vi - 'yf) + 'yf + (p - 2). 

(ii) Under the assumptions of Theorem 5.1.1(ii) the limiting distribution of 
\/nvec{ W-Ip) is multivariate normal with mean vector 0 and the following as¬ 
ymptotic variances. 

ASV{wkk) = , 

ASV{wki)= , ,, , k^l, 

[Kk — Kl) 


where C 22 = (wfc - fil) + (uJl - fif) + K^ + Y^m^k.lil^rn - 1). 

(Hi) Under the assumptions of Theorem 5.1.1 (Hi) the limiting distribution of 
y/nvec{ W-Ip) is multivariate normal with mean vector 0 and the following as¬ 
ymptotic variances. 


ASV{wkk) 

{wki) 


Kfc + 2 
^ ’ 

+ (1 - <y)^blkiC.22 + 2 q ;(1 - 
{o^^iki + (1 - 


a)5iki52kifi2 


kfi^l, 
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where Siki = (7fc — Ji), = (^fc — ki), a G [0,1] is the proportion of weight given 
to skewness, and Cn is as in (i), C 22 as in (ii) and C 12 = {rjk — IkPk) + {vi ~ liPi) + 
I'm- 

By noting in Corollary 5.1.2(ii) that {/3m — 1) > 0, m = 1, ...,p we further get a 
lower bound for the corresponding asymptotic variance. 

Corollary 5.1.3. The asymptotic variance of the classical FOBI in 5.1.2(ii) has 
a lower bound of 

{uJk - I3l) + juJi - Pf) + Kf 

(Kfe - KlY 

5.2. Using all cumulant matrices. Besides the issue of achieving affine equiv- 
ariance, another clear drawback of using the compound cumulant matrices and 
in the estimation of the unmixing matrix is that these matrices combine only 
certain subsets of all p^ or p'* possible joint third or fourth cumulants. As such, 
the compound cumulant method may not use all the information available in joint 
cumulants. 

A standard solution used in the literature (for fourth cumulants) is to, instead 
of using only the matrix C"^, use all the cumulant matrices simultaneously. 

This approach, called JADE, was introduced in Cardoso and Souloumiac (1993). 
For further details and variants, see also Bonhomme and Robin (2009); Miettinen 
et al. (2013). We give the following. 

Definition 5.2.1. The functional based on all third and fourth cumulants is a 
functional W{Fg,) = , where S = Cov{x) and 

( p p p 

a y \\diag{ + (1 - a) V V \\diag{ f/^)f 

where and are evaluated at x^t and a G [0,1] is the proportion of weight 
given to skewness. 

For 0 = 0, the classical JADE estimate is obtained. Moreau (2001) has a 
similar definition for his eJADE estimate, the matrices UC^*(xst)U" replaced 
by C^*(Uxs 4 ). Miettinen et al. (2015) proved that the classical JADE functional 
(a = 0) is affine equivariant. This is true for the combined functional W(Ex) as 
well and we have the following. 

Lemma 5.2.1. The functional W{Fx) in Definition 5.2.1 is an independent com¬ 
ponent functional for all a G [0,1]. 

The Lagrangian of the maximization problem in Definition 5.2.1 has the form 

L(U, A) = a y y (u^C3*Ufc)2 + (1 - a) y y y (urc4bufc)2 

k—1 2—1 j — 1 k—1 

p— Ip p 

-y y -yAfefc(yufc-1). 

k—1 l—k-\-l k—1 

Optimization provides the estimating equations 

UT"^ = TU^ and UU"^ = Ip 
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where T = (Ti,Tp)^ now with 

Tfc = a^(u|’c3*Ufc)C3*Ufe + (1 - a)^^(u^C4*^Ufe)C4*^Ufc. 

i—1 i—1 j—1 

As in previous sections we then obtain the following. 

Theorem 5.2.1. (i) Let Zi,...,Zn be a random sample from a distribution with 
finite sixth moments and satisfying assumptions 1 and 3. Then there exists a se¬ 
quence of solutions based on skewness (that is, a = such that W^p Ip and 

y/n{wkk - 1 ) = -^V^(Sfefc - 1 ) + 0 _p(l), 

^/nwki = +op(l)^ l¥^k, 

Tk + if 

where ffiki = IkCki - lifik - ifski- 


(ii) Let Zi,Zn be a random sample from a distribution with finite eighth mo¬ 
ments and satisfying assumptions 1 and 4- Then there exists a sequence of solutions 
based on kurtosis (that is, a = Q) such that W Ip and 

Vn{wkk - 1) = -^Vn{skk - 1) + op{l), 

Vnwki = ^^^^2 +Op(l)> 

+ «r 

where ii 2 ki = HkQki - Kimk - (nkPk - SKi)ski- 


(Hi) Let Zi,Zn be a random sample from a distribution with finite eighth mo¬ 
ments and satisfying assumptions 1 and 1. Then there exists a sequence of solutions 
based on both skewness and kurtosis such that W ^p Ip and 

\/n{wkk - 1) = -^V^(sfefc - 1) + op(l), 

_ ay/nipiki + (1 - a)\/nf!2ki , ,, 

«{7fe+7;^) + (l-a)(K^ + K2) 

where a € [0,1] is the proportion of weight given to skewness, and ipiki is as in (i) 
and ip 2 ki as in (ii). 


Corollary 5.2.1. (i) Under the assumptions of Theorem 5.2.1(i) the limiting dis¬ 
tribution of y/nvec{ W — Ip) is multivariate normal with mean vector 0 and the 
following asymptotic variances. 

ASV{wkk) = 

where Cn = 7fe(j^fc - ll) + ifi^i - if) + if ■ 


k^l, 
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(a) Under the assumptions of Theorem 5.2.1(ii) the limiting distribution of 
\/nvec{ W-Ip) is multivariate normal with mean vector 0 and the following as¬ 
ymptotic variances. 

ASV{wkk) = 

ASV{wki)= ,^l\, k^l, 

where C 22 = - /3^) + Kf{u}i - Pf) + ref. 


(Hi) Under the assumptions of Theorem 5.2.1(iii) the limiting distribution of 
y/nvec{ W-Ip) is multivariate normal with mean vector 0 and the following as¬ 
ymptotic variances. 


ASV{wkk) 
ASV (wki) 


Kfc + 2 
^ ’ 

«^Cii + (1 - «)\22 + 2q;(1 - a)Ci2 

(a(7fc + 7?) + (1 - 


where a € [0,1] is the proportion of weight given to skewness, and Cii is as in (i), 
C 22 as in (a) and C 12 = Ik^-kiTk - IkPk) + im{m - liPi) + 


Comparison of the asymptotic variances in Corollary 4.2.1 and Corollary 5.2.1 
immediately yields the following result. 


Corollary 5.2.2. (i) If a = Q or a = 1, the asymptotic variances of the estimates 
based on symmetric projection pursuit are the same as those based on all cumulant 
matrices. 

(ii) The asymptotic variances of the estimates based on symmetric projection pursuit 
are the same as those based on all cumulant matrices if their respective weights as 
and aj satisfy as = 4q;j/(3 + aj). 


6. Comparison of the estimates 

6.1. Projection pursuit for cluster identification. Given that all the methods 
allow tuning in the form of the weighting parameter a, a natural question is whether 
there exists some optimal choice of weighting for any particular pair of source 
distributions. We approach this question first in the context of cluster identification. 
This approach is not new, for both skewness and kurtosis have been used before 
for similar purposes. In Jones and Sibson (1987) the authors use approximative 
techniques to find a linear combination of squared skewness and kurtosis to use as 
entropy index in projection pursuit. In Pena and Prieto (2001) the authors project 
the data into directions that have extremal kurtosis in hopes of discovering clusters. 

For the model, assume that the vector of independent components z is a mixture 
of two multivariate normal distributions z*, namely 

z* ~ TT •Wp(0,lp) + (1 - tt) • A/'p(/rei,Ip), 

standardized to have zero mean and identity covariance matrix and where tt € 
(0,1) and /j, e K\{0}. Under the model, the only independent component having 
non-zero skewness or kurtosis is the first one, meaning that only the first row of 
the unmixing matrix is identifiable (up to sign). However, this is enough as the 
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Figure 2. The optimal choices of weight a for different values of 
TT and IjL. 


first component carries all the information needed for the group separation, the 
remaining components can be considered just as noise. 

We use the projection pursuit approach to estimate the direction w of Fisher’s 
linear discriminant subspace. Note next that ■ji = ki = 0 for I > 1 and the 
asymptotic variances of the estimated elements wu, I > 1, are the same for the 
deflation-based and symmetric projection pursuit. Thus to choose the optimal 
weighting for group separation we want to minimize the variance 




9ct^Ciifc + 16(1 - a)^C22fc + 24a(l - a)Ci2fc 
(3a7fe +4(1 - a)4)2 


for k = 1, where Ciife, C 22 fe and Ci 2 fc are as in Corollary 4.1.1. 

Using the function optimize in R (R Core Team, 2014) we searched the global 
minimum of / for three choices of /i = 2, 5,10 and the results are shown in Figure 2 
(we only need to consider the interval tt S (0,0.5] due to symmetry). First, the plots 
show that the choice of p, has hardly any effect on the optimal value of a. Secondly, 
we see two discontinuity points, namely tt = (3 -I- -s/S)”^ =: tq and tt = 0.5 which 
arise due to excess kurtosis and skewness respectively vanishing in those particular 
values of tt. And thirdly, we observe that the curve goes to zero when approaching 
the point ttq from the right. This counterintuitively suggests using only kurtosis 
even though ki « 0 in the vicinity of ttq. However, a careful examination shows 
that when it = ttq + e for some small e > 0, the function / indeed has a global 
minimum near zero but it also has limQ,_j.o+/(a) = oo. Thus for practical purposes 
the global minimum might be too close to zero to be of any use. 

Hence, based on the above considerations, we thus suggest a = 0.8 as a good, 
all-around weight for use in group separation of normal mixtures. This choice is 
further supported by the fact that it corresponds to the weights given to squared 
skewness and squared excess kurtosis in the classical Jarque-Bera test statistic for 











20 


THIRD AND FOURTH CUMULANTS IN ICA 


normality ( 71 / 6 ) 7 ^ + (n/24)«;^ (Jarque and Bera, 1987) (under normality, ( 71 / 6 ) 7 ^ 
and ( 77 / 24 ) are independent and both have a chi-squared distribution with one 
degree of freedom). This corresponds also to the effective value derived in Jones 
and Sibson (1987). 

6.2. Comparison of asymptotic variances in IC models. Due to the affine 
equivariance of the estimates, it is sufficient to consider the behavior of W only 
in the case O = Ip. For all estimates, y/nvec{W — Ip) -^d -^p2(0,T) and the 
comparison should then be made using the asymptotic covariance matrix T. Then 
tr(T) is 

p p p P 

EE ASV (wki) = ASV {wkk) + E E {ASV (wki) + ASV {wik)) 

k=l 1=1 fc=l fc=l l=k+l 

where X]fc=i ■^SV{wkk) is the same for all estimates. As in Miettinen et al. (2015) 
we then use the values ASV{wki)+ASV{wik) for the comparison of the estimates for 
different choices of fcth and Ith marginal distributions. Surprisingly, for all deflation- 
based and symmetric projection pursuit estimates, this criterion value depends only 
on the fcth and Zth marginal distributions. For the estimates that use compound 
cumulant matrices, we use ASV{wi 2 )+ASV (w 2 i) with p = 2 and the same marginal 
distributions, as it is in fact the lower bound of ASV{wki) + ASV{wik) if all other 
components are symmetric. 

As the asymptotic variances of the symmetric projection pursuit approach and of 
the approach based on all cumulant matrices are the same (with adjusted weights), 
we in fact have three different methods to compare, namely, deflation-based PP, 
symmetric PP and the estimate based on compound cumulant matrices. For each 
method we distinguish the versions using third cumulants only (a = 1), fourth 
cumulants only (a = 0), and third and fourth cumulants with the weight a = 0.8. 
The fcth and Zth marginal distributions are standardized versions of the exponential 
power distribution EP(a) or Gamma(a) with densities 

!{z) = or !{z) = z > 0, 

with Ki,K 2 > 0 and positive shape parameter a. The asymptotic variances (and 
their lower bounds) then depend on the marginal distributions only through their 
shape parameters a For more details on the distributions in a similar study see 
Miettinen et al. (2015). The values of ASV{wki) + ASV{wik) for different combi¬ 
nations of families and parameters are shown in Figures 3 and 4. We do not report 
the results in cases where both components come from the symmetric exponential 
power family. In this case, the asymptotic variances of the estimates with 0 < a < 1 
are the same as the asymptotic variance of the estimate with a = 0 and the results 
for a = 0 are already given in Miettinen et al. (2015). In figures, a darker shade 
indicates a larger value so that the performance of a particular method is at its 
best in the areas of lighter color. 

From the contour plots we see that in the cases considered the performances of 
the estimates based on compound cumulant matrices are clearly the worst. One 
reason for this is, that none of them permit two sources having exactly the same 
distributions, causing the darker shades in the diagonals of Figure 4 (see the As¬ 
sumptions 5, 6 and 8 in Section 3). It also seems that the symmetric projection 
pursuit (and the multiple cumulant method) gives the best performance, although 
the deflation-based methods do not come far behind. 
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Skewness 


Kurtosis 


Both 



Figure 3. Contour plots of ASV{wi 2 ) + ASV{w 2 i) for different 
combinations of methods and cumulants used when the cc-axis in¬ 
dependent component has an exponential power distribution and 
the y-axis independent component has a gamma distribution. 


Note, that the symmetry of exponential power distribution is evident in the 
upper two plots on the left-hand side of Figure 3 where the sum of variances depends 
clearly only on the properties of the gamma distribution. The same two plots also 
showcase the fact that in the bivariate case when exactly one of the independent 
components is symmetric, both skewness-based projection pursuit methods have 
the same asymptotic behavior (as measured by the sum of off-diagonal asymptotic 
variances). The same would also hold for kurtosis, as can be verified by inspecting 
the results in Corollaries 4.1.1 and 4.2.1. 

7. Discussion 

In the previous sections, four different approaches for solving the independent 
component problem were thoroughly discussed. Each method was first precisely 
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Figure 4. Contour plots of ASV{wi 2 ) + ASV{w 2 i) for different 
combinations of methods and cumulants used when both indepen¬ 
dent components come from gamma distributions. 


defined and then had its affine equivariance proven and estimating equations and 
algorithms provided, and finally the methods’ asymptotic properties were derived. 
The main novelty in this paper is the combination of third and fourth cumulants in 
ICA where the weight given to skewness (or kurtosis) can be considered a tuning 
parameter. The special case of giving all weight to kurtosis yields then in the 
corresponding cases the deflation-based FastICA and the classic FOBI. Whereas 
the novel symmetric approach then gives perhaps a more natural version of the 
currently used symmetric FastICA approach. 

The most surprising result here is the similar asymptotic behaviors of the sym¬ 
metric projection pursuit and the method based on all cumulant matrices (includ¬ 
ing JADE). Note that the squared symmetric projection pursuit is computationally 
much lighter than JADE, and could thus possibly replace the use of JADE in many 
applications. Following this discovery, a justifled question to ask is whether moving 
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from the absolute values to squares provides better results also in the general case 
of symmetric FastICA. This will be considered in a separate paper. Another sur¬ 
prising result was that the compound cumulant approach needs special treatment 
to obtain affine equivariance when combing third and fourth compound cumulant 
matrices. Although the price to pay for this seems relatively low as just a few 
stronger assumptions are needed. However as our comparisons indicate, this ap¬ 
proach in general seems to be inferior to all other methods discussed here and its 
main advantage is its computational simplicity. 

In the comparison section we established that all the methods can also be used 
successfully in cluster identification in the case of a multivariate normal mixture. 
Additionally, when using the projection pursuit methods for such a goal, a good 
rule of thumb for the choice of weights for squared skewness relative to squared 
kurtosis is to give 80% of the weight to squared skewness, or alternatively, giving 
equal weights to standardized squared skewness and standardized squared excess 
kurtosis. This weighting then also coincides with the weighting used in the classical 
Jarque-Bera test of normality based on the same momentary quantities. 

Finally, it is interesting that although all the methods considered are defined 
very differently from each other, the corresponding expressions for the asymptotic 
variances in Corollaries 4.1.1, 4.2.1, 5.1.2 and 5.2.1 exhibit pleasing symmetry. 
Based on this pattern one could even make a highly educated guess on what the 
asymptotic properties of the even higher moment versions of the methods would be 
(assuming that the methods actually exist). 
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Appendix 

Proof of Theorem 4^. 1.1. Note first, that under the assumptions of the independent 
component model the following two identities hold. 

p p 

7 (u^z) = ^ Mfc7fc and k(u^z) = ^ uj.Kk. 

k=l k=l 

Then, by using the Cauchy-Schwarz inequality and the fact that uj, < 1, Vfc we have 

ai7^(u’^z) -I- a2K^(u'^z) 

<«! + “2 XI 


k=l k=l 

P 



k=l 


from which the result follows. □ 

Proof of Theorem 4-1-2. We begin by proving the consistency of the estimator and 
due to the affine equivariance of the squared deflation-based projection pursuit func¬ 
tional W, we may without loss of generality restrict our attention to the case = I 
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(this holds true for all the methods considered). Note then that the population and 
sample objective functions are of the forms 

Diu) = Wj {E [gj{u^z)]f and Dr,{u) = H 

j=i V” ^=l 

where Wj are the weights given to the functions gj. Consequently 

,7 

sup \D{u) — Dn{u)\ < 2^'^j 

uT’u=l uT’u=l 


{E [gj(u^z)])"- 



The difference of squares then factorizes into form {Gj + Gj){Gj — Gj), where the 
first factor (for our choices of gj) converges to finite constant due to the assumption 
on finiteness of moments and for the second factor we can use the uniform law of 
large numbers. As our choices for the functions gj are continuous and the set 
{u G = 1} is compact, we then have sup^Tu^i|D(u) — £)„(u)| —7p 0. 

Now D{u) has the unique (up to sign) maximizer ei, and applying the technique 
used in the proofs of Miettinen et al. (2014c), the above uniform convergence in 
probability implies the consistency of step 1 (up to sign), P(|lwi — ei|| < e) —0. 

For the convergence of step 2, we follow in the vein of Miettinen et al. (2014c) and 
move to the orthogonal complement of the span of Ui and consider the functions 
D 2 i'v) := Il(Ev) and Il 2 ,n(v) := Il„(Ev), where v G E = (e 2 ,...,ep) G 

IR^’x(p-i) and E is chosen as the closest matrix to E with respect to matrix norm 
such that the matrix (lii, E) is orthogonal. Note that this is not restricting as E 
and E are bases of e]*- and u^, respectively. The consistency of Ui also implies 
E -^p E. 

Similar reasoning as used above with D{u) and Zl„(u) in conjunction with the 
following convergence implied by the results in Randles (1982) and the finiteness 
of moments and differentiability of our choice of functions gj, 


1 

n 




5 j(v-' E-' VLstj) -^P E gj{-v^ E z) 


can be used to prove the convergence, supyTv^i|i42(v) —Zl 2 ,n(v)| —J^p 0. Observing 
then that D{w) has the unique (up to sign) maximizer ei, the arguments used for 
Ui then show that v —^p ei and consequently U 2 = Ev —)-p e 2 . Using similar 
constructions for k = 3, ...,p — 1 we get the consistency of the estimator up to 
sign-change, that is W —7p Ip. 

For the asymptotic behavior of W we then consider the diagonal and off-diagonal 
elements of W separately, and starting with the diagonal elements we first establish 
the following Lemma. 


Lemma Al. Assume that W= US where y/n{S—Ip) =Op{l), ^/n{U—Ip) = 

Op{l) and U G U. Then the following three hold. 

Vn{wkk - 1) = -^Vniskk - 1) + op(l), 

Vnwki + y/nwik = -\/nSki + Op(l), 

y/nwki = Vnuki - ^Vnski + op(l). 


k=l, ...,p, 
k I = 1, ...,p, 
k I = \, ...,p. 
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To prove Lemma Al consider first the following identity. 

Op = - Ip)S-^/2 - Ip) + v^(S-i/2 - Ip) 

= v^(S - Ip) + 2 v^(S-i/ 2 - Ip) + op(l). 

For the second equality above, note that •\/n(S — Ip) = —— Ip)(S + S^/^) 
which implies that ■\/n(S~^^^ — Ip) is bounded in probability, thus allowing us to 
conclude the identity — Ip) = —(l/2)-yn(S — Ip) + Op(l). 

Using similar techniques one can prove that y/n(U^ — Ip) = —y/n(lJ—Ip) + op(l) 
and yn(W — Ip) = ^/n(U — Ip) + — Ip) + op(l). As a consequence of 

these we then get the third claim of the lemma. 

Consider then the sum of W with its transpose W + 

Vn(W + - 2Ip) = v^(U - Ip)S-i/2 ^(s-i/2 _ 

+ - Ip)U^ + v^(U^ - Ip) + op(l) 

= -v^(S - Ip) + op(l), 


from which the first two claims follow. 

For the asymptotic behavior of the off-diagonal elements we require in the current 
proof and the proof of Theorem 4.2.2 the following estimators. 


hsk = 


hik — 


^(w^z^)^ 

1 

Tsfc = -^(WfcZ,)2z, 
Tl ' ^ 


2=1 

n 

2=1 

1 _ ” 

T4fe = -^(w^Zz)^Z, 
2=1 


satisfying /igfc ^-p 7 fe,h 4 fe ->p Kk^T^k Ik^k and T 4 fe ->p /Sfee^. Note then, 
that in terms of W = (wi,..., Wp)^ = the estimating equations have the 

form 

k 

ffc = S(^WjwJ)ffc, 
t=i 

where F/c = 3 ah 3 feT 3 fe-|- 4 (l— 0 ) 714 ^X 4 ^. Then, using Equation (5) from Nordhausen 
et al. ( 2011 ) we get the identity 


k 

(2) 3k^/n{tk -Tfeefc) =rfc[Vn(S - Ip)efc + - ej) 

i=i 

-|-V^(wfe - Gfe)] -I- Op(l), 


where 3k = J2j>k ^^d F^ = Sayf -I- 4(1 — a)Kkl3k- Next, using Equation (3) 
from Nordhausen et al. (2011) separately for T^k and 'T/^k gives the following two 
identities. 


(3) v^(T3fc - Ik^k) = Vnvk - 2efcefc ^/nz -h 27 fcE'"''^/n(wfe - e^) -h op(l), 

(4) V^(T4fc - /3feefc) = Vnc[p - S-fkekel^/nz 

+ 3(Ip -|- (/3/c — l)E^^)-\/n(wfc — Gfc) -|- op(l), 

where ffc = (l/n) “ 1)^* and Qfc = {I/n) - 7fe)z*- Using Equa¬ 

tions (3) and (4) together with the fact that \/n{h 3 k'^ 3 k — ll^k) = IkV^i'^sk — 
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'fk^k) + lk\/n{h^k — 7 /c)efc + op(l) (and the analogy for T 4 fc) we get an alternative 
expression for y/n{Tk — TkBk) which can be substituted into Equation (2). Inspect¬ 
ing the result element-wise then yields the following two equations from which the 
asymptotic result follows. 

0 = \/nsik + Vnwik + Vnwki + op(l), I < k, 

and 


ia'yk'/nfki + 4(1 - a)iikiVnqki + iy/nwki) 

= (3a7fc-1-4(1 - a)Kfc/3fc)(v^Sifc + + op(l), l>k. 


□ 


Proof of Theorem 4.2.1. For the proof we require the following Lemma. 

Lemma A2. Let a p x p matrix U GU, b gRp and r G N, r > 2. Then 

p p p p 

KkUiihbi < bl. 

i^l k^l 1^1 k^l 

To prove Lemma A2 we first utilize the Cauchy-Schwarz inequality. 
tttuikuibkb, = ± <±±<k-^bi. 

i=l k=l 1=1 i=l \fc=l / k=l 

Then observing that gives the desired result. 

The inequalities of Theorem 4.2.1 then easily follow by first expanding the left- 
hand sides under the assumptions of the independent component model in (1) to 
yield 

P P P P P P 

u%lkli and EEE '^ik'^ilP^kk^l. 

i=l k=l 1=1 i=l k=l 1=1 

Then, for both cases, an application of Lemma A2 gives the desired result. 

□ 

Proof of Lemma 4.2.2. The matrix form UT^ = TU^ of the estimating equations 
follows easily by element-wise inspection. This yields further (T^U)^ = T^T from 
which the result follows by first taking the symmetric square root of both sides. □ 

Proof of Theorem 4.2.2. For the consistency, the uniform convergence in probabil¬ 
ity of the sample objective function to the population one follows easily from the 
proof of Theorem 4.1.2 as the objective functions L^(U) and Z1„(U) are now just 
sums of the individual objective functions of the squared deflation-based projection 
pursuit. The desired result P(||W — Ip|j <e)—>'l,Ve>0, is then proven similarly 
as in Miettinen et al. (2014b) 

For the asymptotic behavior, Lemma Al takes care of the diagonal elements so 
we will only need to consider the off-diagonal elements. The sample versions of the 
estimating equations for k,l = l,...,p are 

(5) Sahsfcwf Tgfc -h 4(1 - a)h4k\vf T4k = -k 4(1 - a)/i4iW^T4;, 
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where /isfc, /i 4 fc, and are as in the proof of Theorem 4.1.2. With a approach 
similar to the one used in the proof of Theorem 6 in Miettinen et al. (2015) we have 
that 

= 7 zv^(wfc - efc)'^ 7 zez + Jlel^/n{T3l - jiei) + op(l), 

and 

Vn/i4zWfcT4/ = Ki^{wk - ekYPiei + - j3iei) + op(l). 

Substituting Equations (3) and (4) into the above expansions and then using the 
symmetry of the estimating equations in (5) gives the following identity. 


3a{jl^/nwik + 'ykVnfki) + 4(1 - a){/3kKky/nwik + KkVnqki + SnkVnwki) 

\/nwki + "fiVnfik) + 4(1 - a){/3iKi^/nWki + Ki^/nqik + “ini^/nwik) + op(l), 

from which the asymptotic result then follows using the second identity of Lemma 
Al. □ 


Proof of Theorem 5.0.3. Evaluating at Xst = U z yields 

C3*(x,i) = IJ^E [(z^Ue,)zz^] U = nfc,C3'=(z)^ U, 


\k^l 


which proves the theorem for C^(xst) also. Similarly, after some simplification, we 
have for fourth joint cumulants 


C4*^(x,t) = Uk^UlJB’^^{z) - 6,,Ip - u,uj - u,uf U 


k,l 

P 


= U^ [J2uk^UkJKkB^'^ U, 


\k^l 


where u/j are columns of U. This then gives the result for C^(xst) also. 


□ 


Proof of Corollary 5.1.1. Observe first, that without loss of generality, we may in 
both cases assume that the non-zero weight is equal to 1. Starting with the third 
cumulants, we have under the independent component model C^*(z) = 7 iE“. Then 
from the proof of Theorem 5.0.3 we have that 

C3*(x*j) = C3*(U*U^z) = U*U'^D,UU*'^, 

where the matrices D^, i = l,...,p are diagonal. This in turn implies that the 
matrix C^(x*() has 

= U*U^ 

i=l 

the last line of which is the eigendecomposition (diagonalization) of the matrix 
C^(x*j). Thus choosing a = 1 in the optimization problem of Definition 5.1.1 leads 
to this same diagonalization and gives the transformation x*^ UU*^xJ( = z. 

For the corresponding proof for fourth moments and the FOBI-matrix, E [x*jX*^x 
denote U*U^ = V G U and see for example Miettinen et al. (2015). The same 
result for C‘^(x*f) = E [x*(X*^x*(Xjf] — {p + 2)Ip then instantly follows □ 


c^(x:*) = 






-i=l 
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Proof of Theorem 5.1.1. We begin by proving the consistency of the estimator. 
Again we first need to show that the sample objective function converges uni¬ 
formly in probability to the corresponding population statistic over lA. For both 
the compound cumulant and multiple cumulant methods the objective functions 
are of the form 

■J p J p 

D{V) = '^Wjj2{ujMjUdf and Dr,(U) ='^Wjj2{u^MjUdf, 

j = l d=l j=l d=l 

where and are the sets of matrices to be diagonalized and their 

sample versions, and Wj are their respective positive weights. It is thus sufficient 
to consider individual supremums of the form 

5'(M,M)= sup |(u^Mu)2 - (u'^Mu)2|, 

u'^u=l 

where M and M are the population and sample version of an arbitrary matrix to 
be diagonalized and thus satisfy M — >p M. The assumptions on the finiteness 
of moments further ensures that ||M -f M|| converges in probability to some finite 
constant and we thus have 

S'(M,M)= sup f|u'^(M-HM)u| • |u'^(M-M)u|) 
u'ru=l ^ ^ 

< sup (||u^||||M-hM|l|lu|l) sup (||u^|||lM-M|l|lu||) 

= ||M-hM|1||M- M|| 

—j-p 0. 


Using then the obtained result, supugiYl'^(U) — A*r!,(U)| —tp 0, the consistency 
of the estimator, that is P(||W — Ip|| < e) —?> 1, Ve > 0, is proven similarly as in 
Miettinen et al. (2014b). 

Next, concerning the asymptotic behavior of W, we again without loss of gener¬ 
ality assume $7 = Ip, and then for diagonal elements it again suffices to use Lemma 
Al. To find the asymptotic behavior of the off-diagonal elements of W we in turn 
utilize the following lemma from the supplementary material of Miettinen et al. 
(2015). 

Lemma A3. Assume that Sk,k = 1.,..., K are py.p matrices such that y/n{Sk —A.^) 
are asymptotically normal with mean zero and Ak = diag{Xki,..., Xkp)- Let U = 
[ill, ■Wp)^ orthogonal matrix that maximizes 

K 

Y,\\diag[USkiF)f. 

/c=l 


Then 


Vnuij = 


Xk.j^^/n\Sk\i 


+ 0p(l), if^j. 


Due to the weighting, the matrices we diagonalize are in terms of Lemma A3 
now actually y/aCA and \/l — aCU Write then V = U*S- so the estimated 
unmixing matrix has the form W = UV and because ^/n(U* — Ip) = Op(l), also 
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y/n(V — Ip) = Op{l) holds. Based on the orthogonality of U* and the results of 
Lemma Al, we then have the following two equalities for k ^ I = 1, 

Vnwki = Vnuki + Vnvki + op(l), 

-'Jnski = Vnvki + ^/nvik + o_p(l) 


Using then Lemmas Al and A3, the above and the fact that C^(z) and C^(z) 
are diagonal we have for the compound cumulant method 


y/nWkl = 


aSikiy/n (cli + 5ikiVk?j + (1 - a)52kiVn + d2kiVki^ 


0‘^lki + (1 - <^)^lkl 


op(l)i 


where 6iki = {"fk - li) and 52ki = i^k - ni). By slightly modifying the proof of 
Theorem 8 in Miettinen et al. (2015) we can get the behavior of the FOBI-matrix 
B with the standardization functional V, namely 


v^(B - A) = 


V- 


n 

2=1 


z,zfV^Vz,zf 


- A 


where A = diag{K,i ,..., Up) + (p + 2)Ip and the inner mean, denoted in the following 
by T, converges in probability to the same constant as the matrix S 4 in the proof 
of Theorem 8 in Miettinen et al. (2015), namely, to A. We hence have 

V^(B - A) = v^(V - Ip)A + Ay/^{V^ - Ip) + v^(t - A) + op(l), 
where an arbitrary off-diagonal element of the last term is 

^ n 1 ^ 

E Zikif ^/n{V'^V - lp)i^Zii + ZikiJ iiZa- 

. hn t ^ 


For the behavior of the latter sum we consult Miettinen et al. (2015) and for the 
first sum, expanding it gives 

1 " 

- ^ z^k^Vn{V^V - Ip)i^hl = Vniy^Y - lp)ki + v^(V^V - lp)ik + op(1), 
2=1 

where the matrix ^/n(Y'^'V — Ip) can be further expanded as — Ip) -|- 

y/niY — Ip) -I- op(l). Putting then everything together we have for an off-diagonal 
element of the FOBI-matrix B (and consequently for an off-diagonal element of 
C4 = B - (p -h 2)Ip) that 

p 

Vnbki = Vnqki + Vnqik + ^ VnQmki 

m^k,l 

+ {ki + P + -i)y/nvki + {Kk+p + ■i)^/nvik + op(l). 

In terms of Lemma A3 we then get 

p 

(yh + S2kiVki^ = Vnqki + Vnqik + L! Vnqmki 

+ Vk + p + 4:){VnVki + Vnvik) + op(l). 


where the effect of the standardization functional V vanishes as Vni’u + Vnvik = 
-Vnski + op(l). 
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For the corresponding result for the matrix we first define some notation. 
Let V —>p Ip denote an arbitrary standardization matrix and let 

1 " 

Hfc = Hfc(V) := - V(zfV^efe)z,zf = + op(1), 

n 

1 "" 

^klm / SikZilZim — 

n 

where Nkim additionally satisfies y/nNkim = (l/v^)I]Li ^ikZuZim - Skiy/nzm - 
SkmV^zi — Simy/nzk + Using then the above and expanding each of the 

matrices V as V = (V — Ip) + Ip we write 


C3 = (V - Ip) Hfc j (W - Ip) + (V - Ip) H, j 

+ (y]Hj (W-ip)+y]Hfe. 

\fe=l / k=l 

Using Slutsky’s theorem this further yields 

V^(C3 - ^ v^(V - Ip)7feE'='^' + 7feE'='=v^(W - Ip) 




/c=l 

P 


/c=l 


+ ^V^(Hfe-7fcE'='=) + op(l). 




Inspecting the result element-wise and using for the last sum the expansion {'H.rn)ki = 
J2u^i '^muNkiu, it easily follows that the element (A:,/) of y/n&, k ^ I, satisfies 

p p 

VnCh = jiy/nvki + 'ykVni’ik + EE VnVmuNklu + Op(l). 

m—1 u—1 

The term consisting of the double sum can further be expanded as 
^ ^ '\/^{'^mu ^mu')^ku^lu'^u T ^ ^ '^/'^^klm T ^p(l); 

m,u m 

the first sum of which vanishes as k ^ I, leaving only the second sum, which after 
simplifying has the form 

Vnfki + Vnfik+ y] y/nfmki- 

m^k,l 

We then have in terms of Lemma A3 

Vn (^Cli + SikiVki^ = Vnfki + Vnfik + + 7fc(v^Vfcz + V^^ifc) + op(l), 

m^k,l 

where the effect of V again vanishes giving then the desired result. 

Note, that in the proof we made no assumption whatsoever on the origin of the 
orthogonal matrix U* and thus any choice of IC functional in the standardization 
leads to the same asymptotic behavior for the estimate W. □ 
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Proof of Theorem 5.2.1. As with the affine equivariance of the compound cumu- 
lant method in Definition 5.1.1, we again carry out the proof by showing that the 
optimization problem in Definition 5.2.1 is invariant under mappings Xgt i-A Vx^^, 
where V GU. 

We first divide the objective function in two parts 

D{XJ,Xst) = aD3{XJ,Xst) + (1 - a)D4{\J,Xst), 

where £> 3 ( 11 ,x^t) = J 2 i=i denotes the part based on third 

cumulants and £) 4 (U,Xst) = IM*a 5 (UC^*'^(xst)U^)|p respectively the 

part based on fourth cumulants. From the proof of Theorem 9 in Miettinen et al. 
(2015) we have that Zl 4 (U,Vxst) = £) 4 (UV,Xs*) and to complete the proof we 
thus need the analogical result for £>3. 

From the proof of Theorem 5.0.3 we see that 

C3*(Vx,t) = j 2 v^kVE [(xf,e,.)x,*xj,] 

k^l 

fc=i 

Denoting W = (wi,..., Wp)"^ := UV and substituting into £> 3 ( 11 , Vxst) we then 
have 

D3(U,Vx«t) =EE ('E^*fcVC3'=(x,t)V^] u, 

i—1 d—1 \ \/c^l / 

= EEEE 'y*fc1'Hc'wJC^'"(Xs4)wdwJC^'' (Xst)Wd 

z—1 d—1 k—1 k' — l 

= EEEwJ C d^d ^ {p^st)'^d ^ '^ik'^ik' 

d—lk—lk' — l \z—1 

= EE 

d—1 k—1 

= D3(UV,x,t). 

Combining this with the result for D 4 we have thus shown that D(U,Vxst) = 

D(UV,x,t). □ 


Proof of Theorem 5.2.1. For the consistency of the estimator W, see the proof of 
Theorem 5.1.1. 

The asymptotic behavior of diagonal elements is covered by Lemma Al and for 
the off-diagonal elements we use Lemma A3 which, noting that C^*(z) = 7 iE“ and 
C'^*'^(z) = SijKiEf^, in conjunction with Lemma Al now gives 


Vnwki 


aM^ -I- (1 — a)M4 

«(7fc + if) + (1 - a)(Kfc + i^f) 


-Vnski + op(l). 


where M 3 = — ^ly/nCfi and M 4 = K^y/nCf^^ — Kiy^nCff. Notice again, 

that as in the proof of Theorem 5.1.1, we again apply Lemma A3 to matrices scaled 
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by the square roots of the weights. We obtain the behavior of fourth cumulants 
from the proof of theorem in Miettinen et al. (2015). 

^4 + Kf)y/nski = KkVnqu - mVnqik - iK/3k - 3Ki)y/nSki + op(l). 

To derive the counterpart for third cumulants we again denote the standardiza¬ 
tion matrix by V —^-p Ip. With a technique similar to the one used for matrix 
in the proof of Theorem 5.1.1 we get 

= V^(V - Ip)7fcE“ + 7fcE'='=v^(V^ - Ip) 

+ y7^(Hfc-7feE'='=)+op(l). 

Inpsecting the equation element-wise and again using the fact that (Hm)^/ = 
^muNkiu (see the proof of Theorem 5.1.1) we then get 

= y/nCf^ = ^kVnvik + VnNkki + op(l), 

which further yields 

^3 - ^(7fe + li)Vnski = 'ykVnfki - liVnhk - 'ylVnSki + op(l). 

□ 
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